feat(aci): Database-tracked coordinated task-based DetectorGroup backfill #102371
This PR has a migration; here is the generated SQL:
--
-- Create model ErrorBackfillStatus
--
CREATE TABLE "workflow_engine_error_backfill_status" ("id" bigint NOT NULL PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY, "date_updated" timestamp with time zone NOT NULL, "date_added" timestamp with time zone NOT NULL, "status" varchar(20) NOT NULL, "detector_id" bigint NOT NULL UNIQUE);
ALTER TABLE "workflow_engine_error_backfill_status" ADD CONSTRAINT "workflow_engine_erro_detector_id_6e5eb8d9_fk_workflow_" FOREIGN KEY ("detector_id") REFERENCES "workflow_engine_detector" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
ALTER TABLE "workflow_engine_error_backfill_status" VALIDATE CONSTRAINT "workflow_engine_erro_detector_id_6e5eb8d9_fk_workflow_";
CREATE INDEX CONCURRENTLY "workflow_engine_error_backfill_status_status_3d9773bb" ON "workflow_engine_error_backfill_status" ("status");
CREATE INDEX CONCURRENTLY "workflow_engine_error_backfill_status_status_3d9773bb_like" ON "workflow_engine_error_backfill_status" ("status" varchar_pattern_ops);
CREATE INDEX CONCURRENTLY "errbkfl_stat_upd_idx" ON "workflow_engine_error_backfill_status" ("status", "date_updated");
This approach allows the backfill to be run to completion and tracked without creating a significant task backlog.
The first phase is a one-time status row creation, which is run in a task loop.
From there, we can trigger the coordinator periodically and have it schedule tasks for work items, very slowly at first to verify behavior, then at increasing volume.
This keeps the capacity cost of the backfill relatively fixed regardless of processing rate, and failed tasks are naturally rescheduled without starving tasks that might succeed.
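As a rough illustration of the coordinator idea described above (not the code in this PR), here is a minimal in-memory sketch. The status values, the `StatusRow` shape, `max_in_flight`, and `stale_after` are all assumptions made for the example; the migration only tells us that each row has a `status`, a `date_updated`, and a unique `detector_id`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Hypothetical status values; the real table only guarantees a varchar(20) "status".
NOT_STARTED = "not_started"
IN_PROGRESS = "in_progress"
COMPLETED = "completed"


@dataclass
class StatusRow:
    """In-memory stand-in for one backfill status row (one per detector)."""
    detector_id: int
    status: str
    date_updated: datetime


def coordinate(
    rows: Dict[int, StatusRow],
    schedule: Callable[[int], None],
    now: datetime,
    max_in_flight: int = 2,
    stale_after: timedelta = timedelta(hours=1),
) -> List[int]:
    """Run one coordinator pass and return the detector ids scheduled."""
    # Assumption: a row stuck in progress past the timeout belongs to a failed
    # task, so it is released and becomes eligible for scheduling again.
    for row in rows.values():
        if row.status == IN_PROGRESS and now - row.date_updated > stale_after:
            row.status = NOT_STARTED
            row.date_updated = now

    # Only top up to a fixed budget, so the cost per pass stays bounded.
    in_flight = sum(1 for r in rows.values() if r.status == IN_PROGRESS)
    budget = max(0, max_in_flight - in_flight)

    # Oldest-updated first: a freshly released row sorts behind untouched rows,
    # so a repeatedly failing item cannot starve the rest.
    pending = sorted(
        (r for r in rows.values() if r.status == NOT_STARTED),
        key=lambda r: (r.date_updated, r.detector_id),
    )

    scheduled: List[int] = []
    for row in pending[:budget]:
        row.status = IN_PROGRESS
        row.date_updated = now
        schedule(row.detector_id)  # stand-in for enqueueing a real task
        scheduled.append(row.detector_id)
    return scheduled
```

Calling `coordinate` on a timer with a small `max_in_flight` and raising it over time matches the "slowly first, then increasing volume" rollout. In a database-backed version, ordering by status and update time is the kind of lookup the `("status", "date_updated")` index above would presumably serve.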
This PR is structured as a framework plus a job that uses it, even though we may never need to reuse the framework, because the split makes it easier to review the job management and the error backfill elements separately. Skipping the separation would mean less code, but ultimately a larger conceptual chunk. The separation also leaves us the option of doing more bulk processing of this sort.
Process-wise, we'd:
Once done, we can delete and drop the table, or we can leave it for reuse.